Skip to content

Conversation

@rhdedgar
Copy link
Collaborator

@rhdedgar rhdedgar commented Nov 7, 2025

To provide better support for disconnected environments, the operator's initContainers are being removed.

This PR removes the initContainer that performs CA Bundle concatenation, and now processes cert data from ConfigMaps directly before mounting the final result into the llama-stack container.

Validation steps are performed to ensure that only legitimate certificate data is being passed to the llama-stack process in the container. Other basic size /number validation is done to ensure the operator stays within its resource limits.

Startup script additions were avoided this time, so that intentions like RHAIENG-1438 can be carried out without additional logic overhead.

Supersedes #174
Closes RHAIENG-1383

@rhdedgar rhdedgar force-pushed the rm_CA_bundle_init_pr branch from ca9bb3a to ab2c259 Compare November 10, 2025 22:11
}

func (r *LlamaStackDistributionReconciler) processODHConfigMapKeys(configMap *corev1.ConfigMap, keys []string, collector *certificateCollector) error {
for _, key := range keys {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to explicitly have odh specific function here instead of just using the above function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was intended to be different from processConfigMapKeys in a couple ways.

For processConfigMapKeys, if the user specifies a ConfigMap that doesn't exist or a Key that has formatting issues, the deployment should fail until those issues are resolved by the user.

If the ODH bundle has some kind of issue, it may not be something that can be resolved by the user if they don't also manage the ODH instance. So the processODHConfigMapKeys function only warns them about it and then continues.


// Skip scaling test if PVC with ReadWriteOnce is used
// Most cloud providers (AWS EBS, Azure Disk, GCE PD) only support ReadWriteOnce for block storage
// ReadWriteMany requires network file systems (NFS, CephFS, etc.) which may not be available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change? Maybe be remove it or add it as a separate commit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this one was added to avoid errors encountered during testing. Now that the initContainer has been removed, the scale test is trying to attach the PVC to two separate containers at the same time, instead of after the initContainer finishes starting up.

If this PR merges, I'm planning on making a follow-up spike ticket to see if this race condition can be worked around somehow in a future commit.

"VLLM_TLS_VERIFY": controllers.CABundleMountPath,
"SSL_CERT_FILE": controllers.ManagedCABundleFilePath,
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've updated this PR with a new commit to check for pods in the Ready status.

@rhdedgar rhdedgar force-pushed the rm_CA_bundle_init_pr branch from ab2c259 to 301d72a Compare November 13, 2025 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants